5 research outputs found

    Development of statistical methods for the analysis of single-cell RNA-seq data

    Single-cell RNA-sequencing profiles the transcriptome of cells from diverse populations. A popular intermediate data format is a large count matrix of genes x cells. This type of data brings several analytical challenges. Here, I present three projects that I worked on during my PhD that address particular aspects of working with such datasets:
    - The large number of cells in the count matrix is a challenge for fitting gamma-Poisson generalized linear models with existing tools. I developed a new R package called glmGamPoi to address this gap. I optimized the overdispersion estimation procedure to be quick and robust for datasets with many cells and small counts. I compared the performance against two popular tools (edgeR and DESeq2) and found that my inference is 6x to 13x faster and achieves a higher likelihood for a majority of the genes in four single-cell datasets.
    - The variance of single-cell RNA-seq counts depends on their mean, but many existing statistical tools perform best when the variance is uniform. Accordingly, variance-stabilizing transformations are applied to unlock the large number of methods with such a requirement. I compared four approaches to variance-stabilize the data, based on the delta method, model residuals, inferred latent expression state, or count factor analysis. I describe their theoretical strengths and weaknesses, and compare their empirical performance in a benchmark on simulated and real single-cell data. I find that none of the mathematically more sophisticated transformations consistently outperforms the simple log(y/s+1) transformation.
    - Multi-condition single-cell data offers the opportunity to find differentially expressed genes for individual cell subpopulations. However, the prevalent approach to analyzing such data is to start by dividing the cells into discrete populations and then test for differential expression within each group. The results are interpretable but may miss interesting cases by (1) choosing the cluster size too small and lacking power to detect effects or (2) choosing the cluster size too large and obscuring interesting effects apparent on a smaller scale. I developed a new statistical framework for the analysis of multi-condition single-cell data that avoids premature discretization. The approach performs regression on the latent subspaces occupied by the cells in each condition. The method is implemented as an R package called lemur.
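As an illustration of the log(y/s+1) transformation named in the abstract above, here is a minimal Python sketch. It assumes that y is a genes x cells count matrix and that the size factor s of each cell is its total count divided by the mean total count across cells. That choice of size factor, the function name `shifted_log_transform`, and the toy matrix are assumptions made for this example; they are not taken from the thesis or from the glmGamPoi and lemur packages.

```python
import numpy as np


def shifted_log_transform(counts):
    """Apply log(y / s + 1) to a genes x cells count matrix.

    Size factors are an assumption of this sketch: each cell's total
    count divided by the mean total count over all cells.
    Returns the transformed matrix and the size factors.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=0)  # one total per cell (column)
    if np.any(totals == 0):
        raise ValueError("every cell needs at least one count")
    size_factors = totals / totals.mean()
    # Broadcasting divides each column by its own size factor.
    return np.log(counts / size_factors + 1.0), size_factors


# Hypothetical toy data: 3 genes x 3 cells, cells sequenced at
# increasing depth (column totals 15, 30, 60).
counts = np.array([
    [0, 2, 4],
    [10, 20, 40],
    [5, 8, 16],
])
transformed, size_factors = shifted_log_transform(counts)
```

In the toy matrix the second gene scales exactly with sequencing depth, so its transformed values are identical across the three cells, while zero counts stay at zero.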

    BBF RFC 105: The Intein standard - a universal way to modify proteins after translation

    This Request for Comments (RFC) proposes a new standard that allows for easy and flexible cloning of intein constructs and thus makes this technology accessible to the synthetic biology community

    The S-palmitoylome and DHHC-PAT interactome of Drosophila melanogaster S2R+ cells indicate a high degree of conservation to mammalian palmitoylomes

    Protein S-palmitoylation, the addition of a long-chain fatty acid to target proteins, is among the most frequent reversible protein modifications in Metazoa, affecting subcellular protein localization, trafficking and protein-protein interactions. S-palmitoylated proteins are abundant in the neuronal system and are associated with neuronal diseases and cancer. Despite the importance of this post-translational modification, it has not been thoroughly studied in the model organism Drosophila melanogaster. Here we present the palmitoylome of Drosophila S2R+ cells, comprising 198 proteins, an estimated 3.5% of expressed genes in these cells. Comparison of orthologs between mammals and Drosophila suggests that S-palmitoylated proteins are more conserved between these distant phyla than non-S-palmitoylated proteins. To identify putative client proteins and interaction partners of the DHHC family of protein acyl-transferases (PATs), we established DHHC-BioID, a proximity biotinylation-based method. In S2R+ cells, ectopic expression of the DHHC-PAT dHip14-BioID in combination with Snap24, or with an interaction-deficient Snap24-mutant as a negative control, resulted in biotinylation of Snap24 but not of the Snap24-mutant. DHHC-BioID in S2R+ cells using 10 different DHHC-PATs as bait identified 520 putative DHHC-PAT interaction partners, of which 48 were S-palmitoylated and are therefore putative DHHC-PAT client proteins. Comparison of putative client protein/DHHC-PAT combinations indicates that CG8314, CG5196, CG5880 and Patsas have a preference for transmembrane proteins, while S-palmitoylated proteins with the Hip14-interaction motif are most enriched by DHHC-BioID variants of approximated and dHip14. Finally, we show that BioID is active in larval and adult Drosophila and that dHip14-BioID rescues dHip14 mutant flies, indicating that DHHC-BioID is non-toxic. In summary, we provide the first systematic analysis of a Drosophila palmitoylome. We show that DHHC-BioID is sensitive and specific enough to identify DHHC-PAT client proteins and provide DHHC-PAT assignment for ca. 25% of the S2R+ cell palmitoylome, providing a valuable resource. In addition, we establish DHHC-BioID as a useful concept for the identification of tissue-specific DHHC-PAT interactomes in Drosophila.